Perfect Pipelining: A New Loop Parallelization Technique
Abstract
Parallelizing compilers do not handle loops in a satisfactory manner. Fine-grain transformations capture irregular parallelism inside a loop body that is not amenable to coarser approaches, but they have limited ability to exploit parallelism across iterations. Coarse methods sacrifice irregular forms of parallelism in favor of pipelining (overlapping) iterations. In this paper we present a new transformation, Perfect Pipelining, that bridges the gap between these fine- and coarse-grain transformations while retaining the desirable features of both. This is accomplished even in the presence of conditional branches and resource constraints. To make our claims rigorous, we develop a formalism for parallelization. The formalism can also be used to compare transformations across computational models. As an illustration, we show that Doacross, a transformation intended for synchronous and asynchronous multiprocessors, can be expressed as a restriction of Perfect Pipelining.
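To make "pipelining (overlapping) iterations" concrete, here is a minimal hand-written sketch of generic software pipelining in Python. It is not the Perfect Pipelining algorithm from the paper; the three-stage loop body (load, compute, store) and the names `sequential` and `pipelined` are assumptions chosen only for illustration.

```python
def sequential(a, k):
    """Reference loop: each iteration runs its three stages back to back."""
    out = [0] * len(a)
    for i in range(len(a)):
        x = a[i]        # stage 1: load
        y = x * k       # stage 2: compute
        out[i] = y + 1  # stage 3: store
    return out


def pipelined(a, k):
    """Same loop with stages of three consecutive iterations overlapped.

    Hypothetical hand-pipelined form: prologue, steady-state kernel, epilogue.
    """
    n = len(a)
    if n < 2:
        return sequential(a, k)  # too short to overlap anything
    out = [0] * n

    # Prologue: start iterations 0 and 1 so the kernel begins "full".
    x = a[0]            # load    (iteration 0)
    y = x * k           # compute (iteration 0)
    x = a[1]            # load    (iteration 1)

    # Kernel: one pass touches three different iterations. No statement
    # consumes a value produced by another statement in the same pass,
    # so a wide machine could issue all three together.
    for i in range(n - 2):
        out[i] = y + 1  # store   (iteration i)
        y = x * k       # compute (iteration i + 1)
        x = a[i + 2]    # load    (iteration i + 2)

    # Epilogue: drain the two iterations still in flight.
    out[n - 2] = y + 1
    y = x * k
    out[n - 1] = y + 1
    return out
```

Both functions return the same list for any input. The pipelined form only reorders the work so that stages from neighboring iterations sit side by side in the kernel.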
Similar resources
Second - level Instruction Cache Thread Processing Unit Thread Processing Unit Thread Processing Unit Instruction Cache First - level First - level First - level Instruction Cache Instruction Cache Execution
This paper presents a new parallelization model, called coarse-grained thread pipelining, for exploiting speculative coarse-grained parallelism from general-purpose application programs in shared-memory multiprocessor systems. This parallelization model, which is based on the fine-grained thread pipelining model proposed for the superthreaded architecture [11, 12], allows concurrent execution of l...
Software pipelining of nested loops for real-time DSP applications
Modern DSP processors have been integrated with Instruction-Level Parallelism (ILP), which presents a challenge to exploit ILP within DSP applications. Software pipelining is an efficient technique used to expose ILP for loop programs and has been widely used for current microprocessors. It has been recently used in DSP compilers, but only for the innermost loops. This paper proposes a new approac...
Asap: Automatic Speculative Acyclic Parallelization for Clusters
While clusters of commodity servers and switches are the most popular form of large-scale parallel computers, many programs are not easily parallelized for clusters due to high internode communication cost and lack of globally shared memory. Speculative Decoupled Software Pipelining (Spec-DSWP) is a promising automatic parallelization technique for clusters that speculatively partitions a loop ...
A high performance parallelization scheme for the Hessenberg double shift QR algorithm
We propose a new parallelization scheme for the Hessenberg double shift QR algorithm. Our scheme allows software pipelining and communication latency hiding, and gives almost perfect load balance. An asymptotic parallelizing overhead analysis shows that our scheme attains the best possible scalability of the double shift QR algorithm, and that the overheads are less than the multishift algorith...
Software Pipeliner: Parallelization of Loops
Software pipelining, an important parallelization technique for loop structures, exploits the parallelism present among the iterations of a loop by overlapping the execution of successive iterations. This paper presents a practical and usable algorithm, Overlapping Modulo Scheduling (OMS), which is capable of modulo scheduling loops subjected to recurrence dependences and resource constraints for rea...